Interleaving a Join Sequence with Semijoins in Distributed Query Processing

Authors

  • Ming-Syan Chen
  • Philip S. Yu
Abstract

In distributed query processing, the conventional approach to reduce the amount of data transmission is to first apply a sequence of semijoins as reducers and then ship the resultant relations to the final site to carry out the join operations. Recently, it has been shown that the approach of applying a combination of joins and semijoins as reducers can lead to substantially larger reduction on data transmission required. In this paper, we develop an efficient heuristic approach to determine an effective sequence of semijoin and join reducers. Semijoins whose execution will reduce the amount of data transmission required to perform a join sequence are termed beneficial semijoins for that join sequence. Note that beneficial semijoins include the conventional profitable semijoins and the gainful semijoins that are not profitable themselves but become beneficial due to the inclusion of join reducers. This type of dependency between semijoin and join reducers complicates the identification of beneficial semijoins and the ordering in the reducer sequence. In this paper, we first obtain a sequence of join reducers and map it into a join sequence tree. In light of the join sequence tree, we derive important properties of beneficial semijoins. These properties are then applied to develop an efficient algorithm to determine the beneficial semijoins which can be inserted into the join sequence. Examples are also given to illustrate this approach. Our results show that the approach of interleaving a join sequence with beneficial semijoins is not only efficient but also effective in reducing the total amount of data transmission required to process distributed queries.

Index Terms: Distributed query processing, gainful semijoins, beneficial semijoins, join sequence tree, reducible set.
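As a rough illustration of the conventional notion of a profitable semijoin mentioned in the abstract, the following Python sketch reduces one relation by another and compares what is shipped against what is saved. It is not the authors' algorithm and does not model gainful semijoins or join reducers. The function names, the sample relations, and the toy cost model (one unit per attribute value transmitted) are all assumptions made for this example.

```python
# Toy model: a relation is a list of dicts; sending one attribute value costs 1 unit.
# All names and the cost model here are hypothetical, chosen only for illustration.

def semijoin(r, s, attr):
    """Return the tuples of r whose value on attr also appears in s."""
    keys = {row[attr] for row in s}
    return [row for row in r if row[attr] in keys]

def semijoin_cost_and_benefit(r, s, attr):
    """Cost: distinct join values of s shipped to r's site.
    Benefit: attribute values of r that no longer need to be shipped."""
    cost = len({row[attr] for row in s})
    width = len(r[0]) if r else 0          # attributes per tuple of r
    removed = len(r) - len(semijoin(r, s, attr))
    return cost, removed * width

def is_profitable(r, s, attr):
    """A semijoin is treated as profitable when its benefit exceeds its cost."""
    cost, benefit = semijoin_cost_and_benefit(r, s, attr)
    return benefit > cost

orders = [
    {"cust": 1, "item": "a"},
    {"cust": 2, "item": "b"},
    {"cust": 3, "item": "c"},
    {"cust": 4, "item": "d"},
]
customers = [
    {"cust": 1, "city": "x"},
    {"cust": 2, "city": "y"},
]

reduced_orders = semijoin(orders, customers, "cust")
print(len(reduced_orders))
print(is_profitable(orders, customers, "cust"))
print(is_profitable(customers, orders, "cust"))
```

In this toy data, reducing `orders` by `customers` removes two of the four order tuples at the price of shipping two join values, whereas reducing `customers` by `orders` removes nothing. Under the assumed cost model only the first semijoin pays for itself.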

Similar resources

Distributed Query Processing in the Internet: Exploring Relation Replication and Network Characteristics

We introduce the concept of network graph for distributed query processing. Semijoins and joins are termed contributive replicated semijoins and contributive replicated joins, respectively, when they are interleaved into a join sequence to reduce the amount of data transmission cost required in a network with replicated relations. Our solution procedure consists of three consecutive steps, name...

A Heuristic Approach to Distributed Query Processing

In a distributed database environment, finding the optimal strategy which fully reduces all relations referenced by a general tree query, may take exponential time. Furthermore, since reduced relations are to be moved to the final site, the optimal strategy which fully reduces all relations does not give an optimal solution to the problem of minimizing the total transmission cost. For a general...

The Enhancement of Semijoin Strategies in Distributed Query Optimization

We investigate the problem of optimizing distributed queries by using semijoins in order to minimize the amount of data communication between sites. The problem is reduced to that of finding an optimal semijoin sequence that locally fully reduces the relations referenced in a general query graph before processing the join operations. The optimization of general queries, in a distributed databas...

Using Remote Joins for the Processing of Distributed Mobile Queries

The query processing in a mobile computing environment involves join processing among different sites which include static servers and mobile computers. In this paper, we first present some unique features of a mobile environment, and then, in light of these features, devise query processing methods for both join and query processing. Remote mobile joins are said to be effectual if they are, wh...

Optimizing Entity Join Queries by Extended Semijoins in a Wide Area Multidatabase Environment

In this paper we consider processing entity join queries in a wide area multidatabase environment, where the query processing cost is dominated by the cost of data transmission. An entity join operation integrates tuples representing the same entities from different relations, in which inconsistent data may exist. The semijoin technique has been successfully used in a distributed database system ...

Journal title:
  • IEEE Trans. Parallel Distrib. Syst.

Volume 3  Issue

Pages  -

Publication date 1992